Web corpora for bilingual lexicography . A pilot study of English / French collocation extraction and translation
نویسندگان
چکیده
منابع مشابه
Automatic Extraction of English Collocations and their Chinese - English Bilingual Examples : A Computational Tool for Bilingual Lexicography
This paper describes the procedures involved in developing EXEC, a web-based system which can automatically extract English collocations and their Chinese-English bilingual examples from parallel corpora. The system draws on statistics, dependency parsing, and Chinese-English parallel corpora of more than 13 million English words and 27 million Chinese characters. By taking a word as well as th...
متن کاملA Corpus-Based Study of zunshou and Its English Equivalents
This paper describes a corpus-based contrastive study of collocation in English and Chinese. In light of the corpus-based approach to identify functionally equivalent units, the present paper attempts to identify the collocational translation equivalents of zunshou by using a parallel corpus and two comparable corpora. This study shows that more often than not, we can find in English more than ...
متن کاملPilot Implementation Of A Bilingual Knowledge Bank
A Bilingual Knowledge Bank is a syntactically and referentially structured pair of corpora, one being a translation of the other, in which translation units are cross-codexl between the corpora. A pilot implementation is described for a corpus of some 20,000 words each in English, French and Esperanto which has been cross-coded between English and Esperanto and between Esperanto and French. The...
متن کاملCollocation Translation Acquisition Using Monolingual Corpora
Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...
متن کاملIdentifying Word Correspondences in Parallel Texts
Researchers in both machine translation (e.g., Brown et a/, 1990) arm bilingual lexicography (e.g., Klavans and Tzoukermarm, 1990) have recently become interested in studying parallel texts (also known as bilingual corpora), bodies of text such as the Canadian Hansards (parliamentary debates) which are available in multiple languages (such as French and English). Much of the current excitement ...
متن کامل